Study Overview:
The primary objective of this study is to assess the impact of Clionid
sponges on coral reef ecosystems in the U.S. Virgin Islands. Coral reefs
are critical for biodiversity and marine life, but have faced
significant declines due to environmental stressors, including climate
change, ocean acidification, and disease outbreaks. Clionid sponges, as
bioeroders, contribute to coral degradation by breaking down the calcium
carbonate structure of corals, accelerating reef decline. By examining
the prevalence and interactions of Cliona sponges with coral over time,
this research aims to understand how these sponges influence reef health
and resilience, particularly in response to environmental disturbances
such as bleaching events and hurricanes.
This study utilizes long-term data collected by the Territorial Coral Reef Monitoring Program (TCRMP) from 2009 to 2023 across 34 reef sites in the USVI. The research focuses on three main questions: (1) how Cliona cover and coral interactions have changed over time, (2) how shifts in coral abundance affect Cliona-coral interactions, and (3) whether Cliona presence correlates with coral health, particularly disease prevalence. Through statistical modeling, including generalized linear models (GLMs) and linear mixed-effects models (LMEMs), the study will analyze the relationship between Cliona prevalence and environmental variables such as temperature, water quality, and coral species. This analysis is intended to reveal patterns of Cliona recruitment on stressed or diseased corals and to identify species-specific vulnerabilities, providing insights into the adaptive mechanisms of Cliona and guiding reef conservation strategies in light of changing ocean conditions.
Code Explanation:
The R scripts provided manage and manipulate TCRMP data, preparing it
for statistical analysis. Key steps include filtering Cliona species
data, calculating prevalence, and integrating environmental factors.
Models such as Linear Mixed-Effects Models (LMEMs) and Generalized
Linear Models (GLMs) are applied to assess Cliona recruitment patterns,
coral interactions, and environmental influences.
Spencer Parr
Coral Health Data:
Coral health data is the base structure of the the TCRMP data
collection.
# Metadata table as a data frame
metadata <- data.frame(
Columns = c(
"Location", "SampleDate", "SampleYear", "SampleMonth", "Period", "SampleType", "Method",
"Recorder", "Transect", "SPP", "Size", "Interactions", "Predation", "Notes", "Damage",
"BLP", "BLII", "BL, P, VP, SP", "BLType", "BLCause", "CW", "CWLight", "CWDark",
"OldMort", "RecMort", "Disease", "NoLesion", "LesionPattern", "LesionEdge"
),
Description = c(
"Monitoring location at which data was collected",
"Date on which data was collected",
"Monitoring period in which data was collected; Note - Sample year may not always match the year from the SampleDate if a location was surveyed late in the annual monitoring period",
"Month in which data was collected; Note - when a location was not surveyed fully in the same month, the month in which sampling began is used as the sample month",
"Refers to the sample period in which the data falls; Annual, PeakBL, PostBL, SCTLD, WS, and Juvenile are the periods",
"Indicates the type of transect (random or permanent) from which data was collected",
"Indicates the way in which data was collected (e.g., line intercept, belt transect)",
"Diver who recorded the data",
"Transect number; Typically 6 transects per site, but additional transects may have been recorded",
"Coral identified to species; If not identifiable to species, genus is used",
"Coral colony size; Length (maximum planar diameter), Width (perpendicular to length), and Height (perpendicular to plane of growth)",
"Refers to other benthic entities interacting with living coral tissue; Includes macroalgae, sponges, cyanobacteria, etc.",
"Type and number of predators (e.g., damselfish, Coralliophila sp.)",
"Additional notes about the colony condition",
"Indicates the type of damage experienced by the colony (e.g., broken, overturned)",
"Indicates the bleaching status of a colony; Used before percent bleaching and paling were recorded",
"Indicates a secondary bleaching condition in addition to 'BLP'",
"Represents the proportion of the colony's tissue affected by bleaching or paling; VP and SP indicate intermediary stages",
"Refers to the type or location of bleaching on a colony (e.g., Focal, Medial, Granular)",
"Notes the cause of observed bleaching (e.g., under Lobophora sp.)",
"Represents the proportion of tissue at the light end of the Coral Watch colony color scale",
"Lightest colony living tissue color observed (Coral Watch scale: 1-6)",
"Darkest colony living tissue color observed (Coral Watch scale)",
"Proportion of the colony skeletal structure considered old partial mortality",
"Proportion of the colony skeletal structure considered recent partial mortality",
"Presence of Caribbean coral diseases; Proportion of the colony affected by lesions",
"Number of lesions present on the colony",
"Describes the location of rapid tissue loss diseases (e.g., at edges or center of colony)",
"Describes characteristics of the active lesion margin (e.g., smooth line, irregular pattern)"
)
)
# Create the table
metadata %>%
kbl(caption = "Metadata for Benthic Coral Health Data") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
position = "left"
)
| Columns | Description |
|---|---|
| Location | Monitoring location at which data was collected |
| SampleDate | Date on which data was collected |
| SampleYear | Monitoring period in which data was collected; Note - Sample year may not always match the year from the SampleDate if a location was surveyed late in the annual monitoring period |
| SampleMonth | Month in which data was collected; Note - when a location was not surveyed fully in the same month, the month in which sampling began is used as the sample month |
| Period | Refers to the sample period in which the data falls; Annual, PeakBL, PostBL, SCTLD, WS, and Juvenile are the periods |
| SampleType | Indicates the type of transect (random or permanent) from which data was collected |
| Method | Indicates the way in which data was collected (e.g., line intercept, belt transect) |
| Recorder | Diver who recorded the data |
| Transect | Transect number; Typically 6 transects per site, but additional transects may have been recorded |
| SPP | Coral identified to species; If not identifiable to species, genus is used |
| Size | Coral colony size; Length (maximum planar diameter), Width (perpendicular to length), and Height (perpendicular to plane of growth) |
| Interactions | Refers to other benthic entities interacting with living coral tissue; Includes macroalgae, sponges, cyanobacteria, etc. |
| Predation | Type and number of predators (e.g., damselfish, Coralliophila sp.) |
| Notes | Additional notes about the colony condition |
| Damage | Indicates the type of damage experienced by the colony (e.g., broken, overturned) |
| BLP | Indicates the bleaching status of a colony; Used before percent bleaching and paling were recorded |
| BLII | Indicates a secondary bleaching condition in addition to ‘BLP’ |
| BL, P, VP, SP | Represents the proportion of the colony’s tissue affected by bleaching or paling; VP and SP indicate intermediary stages |
| BLType | Refers to the type or location of bleaching on a colony (e.g., Focal, Medial, Granular) |
| BLCause | Notes the cause of observed bleaching (e.g., under Lobophora sp.) |
| CW | Represents the proportion of tissue at the light end of the Coral Watch colony color scale |
| CWLight | Lightest colony living tissue color observed (Coral Watch scale: 1-6) |
| CWDark | Darkest colony living tissue color observed (Coral Watch scale) |
| OldMort | Proportion of the colony skeletal structure considered old partial mortality |
| RecMort | Proportion of the colony skeletal structure considered recent partial mortality |
| Disease | Presence of Caribbean coral diseases; Proportion of the colony affected by lesions |
| NoLesion | Number of lesions present on the colony |
| LesionPattern | Describes the location of rapid tissue loss diseases (e.g., at edges or center of colony) |
| LesionEdge | Describes characteristics of the active lesion margin (e.g., smooth line, irregular pattern) |
# Notes as a data frame
notes <- data.frame(
ID = 1:11,
Note = c(
"Transects are 10m in length with typically 6 per site, but additional transects may be present.",
"Prior to 2007 only colonies with a max planar diameter greater than 10 cm were assessed.",
"Prior to 2005, there were lapses in recording some variables.",
"Coral interactions have been recorded from 2009 onward.",
"Highlighted data has not been verified due to missing data sheets.",
"Use of CW, CWLight, and CWDark was phased out in September 2015.",
"Macroalgae previously identified as 'encrusting Lobophora sp.' was changed to Peyssonnelia sp in October 2015.",
"Ramicrusta sp was officially added as a macroalgae interaction in 2017.",
"Stony Coral Tissue Loss Disease (SCTLD) was first observed in the US Virgin Islands in January 2019.",
"Additional columns (NoLesion, LesionPattern, LesionEdge) were added in February 2020.",
"During the sample period 'Juvenile', specific transects were assessed for coral recruitment."
)
)
# Create the notes table
notes %>%
kbl(caption = "Additional Notes for Coral Health Metadata") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
position = "left"
)
| ID | Note |
|---|---|
| 1 | Transects are 10m in length with typically 6 per site, but additional transects may be present. |
| 2 | Prior to 2007 only colonies with a max planar diameter greater than 10 cm were assessed. |
| 3 | Prior to 2005, there were lapses in recording some variables. |
| 4 | Coral interactions have been recorded from 2009 onward. |
| 5 | Highlighted data has not been verified due to missing data sheets. |
| 6 | Use of CW, CWLight, and CWDark was phased out in September 2015. |
| 7 | Macroalgae previously identified as ‘encrusting Lobophora sp.’ was changed to Peyssonnelia sp in October 2015. |
| 8 | Ramicrusta sp was officially added as a macroalgae interaction in 2017. |
| 9 | Stony Coral Tissue Loss Disease (SCTLD) was first observed in the US Virgin Islands in January 2019. |
| 10 | Additional columns (NoLesion, LesionPattern, LesionEdge) were added in February 2020. |
| 11 | During the sample period ‘Juvenile’, specific transects were assessed for coral recruitment. |
site_metadata <- data.frame(
Location = c(
"Coral Bay", "Fish Bay", "Meri Shoal", "Black Point", "Botany Bay", "Brewers Bay",
"Buck Island STT", "Coculus Rock", "College Shoal East", "Flat Cay", "Ginsburg Fringe",
"Grammanik Tiger FSA", "Hind Bank East FSA", "Magens Bay", "Ruperts Rock", "Savana",
"Seahorse Cottage Shoal", "South Capella", "South Water", "St James", "Buck Island STX",
"Buck Island STX Deep", "Cane Bay", "Cane Bay Deep", "Castle", "Eagle Ray", "Great Pond",
"Jacks Bay", "Kings Corner", "Lang Bank EEMP", "Lang Bank Red Hind FSA",
"Mutton Snapper FSA", "Salt River Deep", "Salt River West", "Sprat Hole"
),
Code = c(
"CRB", "FSB", "MSR", "BPT", "BTY", "BWR", "BIT", "CKR", "CSE", "FLC", "GBF",
"GKT", "HBE", "MGN", "RRK", "SVN", "SHR", "SCP", "SWT", "SSJ", "BIX", "BID",
"CBS", "CBD", "CST", "EGR", "GRP", "JKB", "KGC", "LBP", "LBH", "MTS", "SRD", "SRW", "SPH"
),
Island = c(
"STJ", "STJ", "STJ", "STT", "STT", "STT", "STT", "STT", "STT", "STT", "STT",
"STT", "STT", "STT", "STT", "STT", "STT", "STT", "STT", "STT", "STX", "STX",
"STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX", "STX"
),
Latitude = c(
18.33797, 18.31417, 18.24447, 18.34450, 18.35738, 18.34403, 18.27883, 18.31257, 18.18568,
18.31822, 18.18770, 18.18885, 18.20217, 18.37425, 18.327433, 18.34064, 18.29467, 18.26267,
18.28068, 18.29459, 17.78500, 17.80659, 17.77388, 17.77661, 17.76278, 17.76150, 17.71097,
17.74337, 17.69116, 17.72145, 17.82427, 17.63660, 17.78523, 17.78530, 17.73400
),
Longitude = c(
-64.70402, -64.76408, -64.75862, -64.98595, -65.03442, -64.98435, -64.89833, -64.86058,
-65.07677, -64.99104, -64.95998, -64.95659, -65.00158, -64.93438, -64.925967, -65.08205,
-64.86750, -64.87237, -64.94592, -64.83238, -64.60917, -64.59935, -64.81350, -64.81522,
-64.59743, -64.69880, -64.65221, -64.57160, -64.90008, -64.54706, -64.44963, -64.86240,
-64.75917, -64.75940, -64.89540
),
ReefComplex = c(
"Nearshore", "Nearshore", "Mesophotic", "Nearshore", "Nearshore", "Nearshore",
"Offshore", "Nearshore", "Mesophotic", "Offshore", "Mesophotic", "Mesophotic",
"Mesophotic", "Nearshore", "Nearshore", "Offshore", "Offshore", "Offshore",
"Offshore", "Offshore", "Offshore", "Mesophotic", "Nearshore", "Mesophotic",
"Offshore", "Offshore", "Nearshore", "Nearshore", "Nearshore", "Mesophotic",
"Mesophotic", "Offshore", "Mesophotic", "Nearshore", "Nearshore"
),
Depth = c(
9, 6, 30, 9, 8, 6, 14, 7, 30, 12, 63, 38, 39, 7, 5, 9, 20, 20, 20, 15, 15, 33,
10, 38, 7, 10, 6, 14, 17, 27, 33, 24, 30, 11, 8
),
YearAdded = c(
2011, 2002, 2005, 2003, 2002, 2002, 2005, 2002, 2003, 2003, 2011, 2003, 2003,
2002, 2022, 2003, 2003, 2003, 2005, 2005, 2002, 2017, 2002, 2009, 2003, 2002,
2003, 2002, 2007, 2009, 2002, 2003, 2009, 2002, 2002
)
)
# Create a styled table
site_metadata %>%
kbl(caption = "Survey Sites Metadata") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE,
position = "left"
)
| Location | Code | Island | Latitude | Longitude | ReefComplex | Depth | YearAdded |
|---|---|---|---|---|---|---|---|
| Coral Bay | CRB | STJ | 18.33797 | -64.70402 | Nearshore | 9 | 2011 |
| Fish Bay | FSB | STJ | 18.31417 | -64.76408 | Nearshore | 6 | 2002 |
| Meri Shoal | MSR | STJ | 18.24447 | -64.75862 | Mesophotic | 30 | 2005 |
| Black Point | BPT | STT | 18.34450 | -64.98595 | Nearshore | 9 | 2003 |
| Botany Bay | BTY | STT | 18.35738 | -65.03442 | Nearshore | 8 | 2002 |
| Brewers Bay | BWR | STT | 18.34403 | -64.98435 | Nearshore | 6 | 2002 |
| Buck Island STT | BIT | STT | 18.27883 | -64.89833 | Offshore | 14 | 2005 |
| Coculus Rock | CKR | STT | 18.31257 | -64.86058 | Nearshore | 7 | 2002 |
| College Shoal East | CSE | STT | 18.18568 | -65.07677 | Mesophotic | 30 | 2003 |
| Flat Cay | FLC | STT | 18.31822 | -64.99104 | Offshore | 12 | 2003 |
| Ginsburg Fringe | GBF | STT | 18.18770 | -64.95998 | Mesophotic | 63 | 2011 |
| Grammanik Tiger FSA | GKT | STT | 18.18885 | -64.95659 | Mesophotic | 38 | 2003 |
| Hind Bank East FSA | HBE | STT | 18.20217 | -65.00158 | Mesophotic | 39 | 2003 |
| Magens Bay | MGN | STT | 18.37425 | -64.93438 | Nearshore | 7 | 2002 |
| Ruperts Rock | RRK | STT | 18.32743 | -64.92597 | Nearshore | 5 | 2022 |
| Savana | SVN | STT | 18.34064 | -65.08205 | Offshore | 9 | 2003 |
| Seahorse Cottage Shoal | SHR | STT | 18.29467 | -64.86750 | Offshore | 20 | 2003 |
| South Capella | SCP | STT | 18.26267 | -64.87237 | Offshore | 20 | 2003 |
| South Water | SWT | STT | 18.28068 | -64.94592 | Offshore | 20 | 2005 |
| St James | SSJ | STT | 18.29459 | -64.83238 | Offshore | 15 | 2005 |
| Buck Island STX | BIX | STX | 17.78500 | -64.60917 | Offshore | 15 | 2002 |
| Buck Island STX Deep | BID | STX | 17.80659 | -64.59935 | Mesophotic | 33 | 2017 |
| Cane Bay | CBS | STX | 17.77388 | -64.81350 | Nearshore | 10 | 2002 |
| Cane Bay Deep | CBD | STX | 17.77661 | -64.81522 | Mesophotic | 38 | 2009 |
| Castle | CST | STX | 17.76278 | -64.59743 | Offshore | 7 | 2003 |
| Eagle Ray | EGR | STX | 17.76150 | -64.69880 | Offshore | 10 | 2002 |
| Great Pond | GRP | STX | 17.71097 | -64.65221 | Nearshore | 6 | 2003 |
| Jacks Bay | JKB | STX | 17.74337 | -64.57160 | Nearshore | 14 | 2002 |
| Kings Corner | KGC | STX | 17.69116 | -64.90008 | Nearshore | 17 | 2007 |
| Lang Bank EEMP | LBP | STX | 17.72145 | -64.54706 | Mesophotic | 27 | 2009 |
| Lang Bank Red Hind FSA | LBH | STX | 17.82427 | -64.44963 | Mesophotic | 33 | 2002 |
| Mutton Snapper FSA | MTS | STX | 17.63660 | -64.86240 | Offshore | 24 | 2003 |
| Salt River Deep | SRD | STX | 17.78523 | -64.75917 | Mesophotic | 30 | 2009 |
| Salt River West | SRW | STX | 17.78530 | -64.75940 | Nearshore | 11 | 2002 |
| Sprat Hole | SPH | STX | 17.73400 | -64.89540 | Nearshore | 8 | 2002 |
# Metadata as a data frame
benthic_cover_metadata <- data.frame(
Columns = c(
"SampleYear", "SampleMonth", "Period", "Location", "FilmDate",
"NoPts", "AnalysisBy", "AnalysisDate", "Transect", "BenthicCover", "Notes"
),
Description = c(
"Monitoring period in which data was collected; Sample year may not always match the year from the FilmDate if a location was surveyed late in the annual monitoring period.",
"Month in which data was collected; when a location was not surveyed fully in the same month, the month in which sampling began is used.",
"Refers to the sample period in which the data falls; Periods include Annual, PeakBL, PostBL, SCTLD, and WS.",
"Monitoring location at which data was collected.",
"Date on which transects were filmed.",
"Total number of random assignment points analyzed for the transect.",
"Individual who performed the benthic cover analysis.",
"Date on which benthic cover analysis was done.",
"Transect number; typically 6 transects per site, but additional transects may have been filmed.",
"All remaining columns refer to individual benthic cover categories. Values reported in 'BenthicData2017-PresRAW' are point totals, while values in 'BenthicData' are calculated benthic cover.",
"Additional notes pertaining to benthic cover."
)
)
# Create the table
benthic_cover_metadata %>%
kbl(caption = "TCRMP Benthic Cover Metadata") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
position = "left"
)
| Columns | Description |
|---|---|
| SampleYear | Monitoring period in which data was collected; Sample year may not always match the year from the FilmDate if a location was surveyed late in the annual monitoring period. |
| SampleMonth | Month in which data was collected; when a location was not surveyed fully in the same month, the month in which sampling began is used. |
| Period | Refers to the sample period in which the data falls; Periods include Annual, PeakBL, PostBL, SCTLD, and WS. |
| Location | Monitoring location at which data was collected. |
| FilmDate | Date on which transects were filmed. |
| NoPts | Total number of random assignment points analyzed for the transect. |
| AnalysisBy | Individual who performed the benthic cover analysis. |
| AnalysisDate | Date on which benthic cover analysis was done. |
| Transect | Transect number; typically 6 transects per site, but additional transects may have been filmed. |
| BenthicCover | All remaining columns refer to individual benthic cover categories. Values reported in ‘BenthicData2017-PresRAW’ are point totals, while values in ‘BenthicData’ are calculated benthic cover. |
| Notes | Additional notes pertaining to benthic cover. |
# Notes as a data frame
benthic_cover_notes <- data.frame(
ID = 1:9,
Note = c(
"Benthic transects are the same ones on which coral health metrics are recorded. Each is 10m in length, typically 6 per site, though additional transects may have been recorded.",
"Coral Point Count with Excel Extensions (CPCe) software was used for analysis through 2018. After 2018, random point assignment was done using R Studio.",
"The number of points per image has varied throughout the monitoring program: 10 (2001-2011), 15 (2012-2013), and 20 (2014-present).",
"Rhodolith (RO) was added as a benthic cover category in 2011.",
"Peyssonnelia sp (PEY) was added as a benthic cover category in 2014.",
"Ramicrusta sp (RAMI) was added as a benthic cover category in 2017.",
"Data in red indicates values were unable to be verified during QAQC because the original analysis file was missing or corrupted.",
"Stony Coral Tissue Loss Disease (SCTLD) was added as a coral condition category after its arrival in the US Virgin Islands in 2019.",
"Cells containing 'ND' indicate that there is no data recorded for that benthic category."
)
)
# Create the notes table
benthic_cover_notes %>%
kbl(caption = "Additional Notes for TCRMP Benthic Cover Metadata") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
position = "left"
)
| ID | Note |
|---|---|
| 1 | Benthic transects are the same ones on which coral health metrics are recorded. Each is 10m in length, typically 6 per site, though additional transects may have been recorded. |
| 2 | Coral Point Count with Excel Extensions (CPCe) software was used for analysis through 2018. After 2018, random point assignment was done using R Studio. |
| 3 | The number of points per image has varied throughout the monitoring program: 10 (2001-2011), 15 (2012-2013), and 20 (2014-present). |
| 4 | Rhodolith (RO) was added as a benthic cover category in 2011. |
| 5 | Peyssonnelia sp (PEY) was added as a benthic cover category in 2014. |
| 6 | Ramicrusta sp (RAMI) was added as a benthic cover category in 2017. |
| 7 | Data in red indicates values were unable to be verified during QAQC because the original analysis file was missing or corrupted. |
| 8 | Stony Coral Tissue Loss Disease (SCTLD) was added as a coral condition category after its arrival in the US Virgin Islands in 2019. |
| 9 | Cells containing ‘ND’ indicate that there is no data recorded for that benthic category. |
The data is read into R using read_csv, and then
selected columns are extracted, specifically: Period,
SampleType, Location, SampleYear,
Method, Transect, SPP,
Cliona, Spo1, Spo1ID,
Spo2ID, and Spo2. This selection reduces the
dataset to only the columns relevant to this analysis.
Next, a new column called presence is created to
indicate whether there is evidence of Cliona presence on each
sample. The mutate function is used along with
if_else to assign a value of 1 to
presence if any of the following conditions are met: the
Cliona value is greater than zero, Spo1ID or
Spo2ID columns contain either “CLSP” or “BOSP”. If none of
these conditions are met, presence is set to
0. After this, presence values are checked for
any missing data, and replace_na is used to fill any
NA values with 0, ensuring that
presence is either 1 or 0 without
any missing values.
The code then applies several filters to refine the dataset.
First, it filters for rows where Period is either “Annual”
or “PostBL”, SampleType is either “Permanent” or
“permanent”, and Method is either “intercept” or “50 cm
belt”. This step ensures that only specific types of samples are kept
for the analysis. Further, rows where Transect contains the
letter “A” are removed, ensuring that only certain transect data is
included. Finally, the code filters for rows where
SampleYear is 2005 or later, focusing on more recent
data.
In the last step, additional transformations are applied. A new
ID column is created to provide a unique identifier for
each row, generated as a sequence from 1 to the number of
rows in the filtered dataset. The SampleYear,
Location, and Transect columns are converted
to factor data types using as.factor to prepare them for
categorical analysis. This finalizes the dataset with all necessary
transformations, making it ready for further analysis or
modeling.
#1
cliona <- read_csv("../Cliona.csv")%>%
select(Period,SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, Spo2)%>%
#2
mutate(presence=if_else(Cliona>0 | Spo1ID %in% c("CLSP", "BOSP") | Spo2ID %in% c("CLSP", "BOSP"),1,0))%>%
mutate(presence=replace_na(presence,0))%>%
#3
filter(Period%in% c("Annual","PostBL")&SampleType%in% c("Permanent","permanent")&Method%in% c("intercept","50 cm belt"))%>%
filter(!grepl("A",Transect))%>%
filter(SampleYear>=2005)%>%
#4
mutate(ID=1:nrow(.),
SampleYear=as.factor(SampleYear),
Location=as.factor(Location),
Transect=as.factor(Transect))
cliona dataset by Location,
SampleYear, and Transect. Within each group,
it calculates three metrics: prev (the average presence of
Cliona, indicating prevalence), freq (total occurrences of
Cliona), and total (total observations in that group).
After ungrouping the data, it adds a unique ID to each row.
The resulting transectprev dataset provides summarized
prevalence data for each unique combination of location, year, and
transect.transectprev <- cliona %>%
group_by(Location, SampleYear, Transect) %>%
summarise(prev = mean(presence), freq = sum(presence), total = n())%>%
ungroup()%>%
mutate(ID= 1:nrow(.))
location_means,
showing the average Cliona sponge prevalence for each location.
It groups the transectprev data by Location
and then calculates the mean of prev (prevalence) within
each group, saving the result as mean_prev. The resulting
location_means data frame has each location’s name and its
corresponding average Cliona prevalence.location_means <- transectprev %>%
group_by(Location) %>%
summarise(mean_prev = mean(prev))
This code generates a plot that visualizes the prevalence of
Cliona sponges at different reef sites by creating a boxplot
for each location. The ggplot function uses the
transectprev dataset, which contains Cliona
prevalence data, and arranges the locations on the x-axis in descending
order of prevalence. Each boxplot represents the range and distribution
of Cliona prevalence values at that location. To enhance the
visualization, the code includes a stat_summary layer that
adds a green diamond marker at the mean prevalence for each location,
making it easy to compare average Cliona prevalence across
sites.
ggplot(transectprev, aes(x = reorder(Location, -prev), y = prev)) +
geom_boxplot(fill = "lightblue") +
stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = "darkgreen") +
labs(title = "Prevalence of Cliona Sponges by Location (with Mean Prevalence Marked)",
x = "Location",
y = "Prevalence") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
This plot is important because it reveals the variability in Cliona prevalence across different reef sites. Locations such as Salt River West, Brewers Bay, and Savana show significantly higher prevalence levels, as indicated by both the wider range and higher position of the boxplots and mean markers, compared to other sites with much lower prevalence. This variability suggests that certain sites may be more susceptible to Cliona sponge colonization, potentially due to environmental conditions or other factors specific to those areas.
# Convert SampleYear to numeric, if appropriate
# transectprev$SampleYear <- as.numeric(transectprev$SampleYear)
transectprevnum<-transectprev
transectprevnum$SampleYear <- as.numeric(transectprevnum$SampleYear)
# Then plot with the updated variable
ggplot(transectprevnum, aes(x = SampleYear, y = prev, color = factor(Transect), group = Transect)) +
geom_line() +
geom_point() +
facet_wrap(~Location, scales = "free_y") +
labs(title = "Prevalence Over Years by Transect",
x = "Year",
y = "Prevalence",
color = "Transect") +
scale_x_continuous(breaks = unique(transectprevnum$SampleYear), labels = unique(transectprevnum$SampleYear)) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5))
This graph is valuable because it allows for temporal analysis of Cliona prevalence trends at multiple scales: across years, transects, and locations. By categegorising between the data by transect and location, the plot can reveal patterns, such as consistent increases or decreases in prevalence, or fluctuations in response to specific events or environmental changes. It could also be used to identify which transects or locations show higher susceptibility to Cliona over time.
Identifying Environmental Event Impact on Cliona Prevalence
envvariable <- transectprev %>%
mutate(Bleaching = if_else(SampleYear %in% c(2010, 2023), 1, 0),
Hurricane = if_else(SampleYear %in% c(2017, 2019), 1, 0),
SCTLD = if_else(SampleYear == 2019, 1, 0))
In order to run a linear mixed effects model, i need the average size of corals grouped by site and year. But 2003-2005 does not have any measurements in the width column. meaning i must cut 2003-2005 out of this section of analysis.
clionawidth<-raw_clio
clionawidth <- clionawidth[!is.na(clionawidth$Width), ]
coral_size_summary <- clionawidth %>%
group_by(Location, SampleYear, Transect) %>%
summarise(mean_volume = mean(Length * Width * Height, na.rm = TRUE),
mean_surface_area = mean(2 * (Length * Width + Width * Height + Height * Length), na.rm = TRUE),
.groups = "drop")
coral_size_summary <- coral_size_summary %>%
filter(!grepl("[A-Za-z]", Transect))
coral_size_summary <- coral_size_summary %>%
filter(!Transect %in% c("7", "8"))
coral_size_summary <- coral_size_summary %>%
mutate(SampleYear = as.factor(SampleYear))
envvariable<- envvariable %>%
mutate(SampleYear = as.factor(SampleYear))
envvariable <- envvariable %>%
left_join(coral_size_summary, by = c("Location", "SampleYear", "Transect"))
Accounts for random effects
#You're gonna want to add temperature to the model. but that is gonna take a second
envvariable <- envvariable %>%
mutate(mean_volume_scaled = scale(mean_volume))
lmem <- lmer(prev ~ Bleaching + Hurricane + SCTLD + mean_volume_scaled +
(1 | Location) + (1 | SampleYear),
data = envvariable)
summary(lmem)
## Linear mixed model fit by REML ['lmerMod']
## Formula: prev ~ Bleaching + Hurricane + SCTLD + mean_volume_scaled + (1 |
## Location) + (1 | SampleYear)
## Data: envvariable
##
## REML criterion at convergence: -7625.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.3206 -0.4851 -0.2305 0.1790 9.9318
##
## Random effects:
## Groups Name Variance Std.Dev.
## Location (Intercept) 0.0005761 0.02400
## SampleYear (Intercept) 0.0001662 0.01289
## Residual 0.0035541 0.05962
## Number of obs: 2778, groups: Location, 33; SampleYear, 17
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.032443 0.005580 5.814
## Bleaching 0.015383 0.014223 1.082
## Hurricane 0.003857 0.014090 0.274
## SCTLD -0.013429 0.019217 -0.699
## mean_volume_scaled 0.004516 0.001212 3.727
##
## Correlation of Fixed Effects:
## (Intr) Blchng Hurrcn SCTLD
## Bleaching -0.171
## Hurricane -0.174 0.068
## SCTLD 0.000 0.000 -0.683
## mn_vlm_scld -0.012 0.006 0.011 0.002
# Extract rows used in the model
model_data <- model.frame(lmem)
# Generate predictions
envvariable$predicted <- NA # Initialize a column for predictions
envvariable$predicted[as.numeric(rownames(model_data))] <- predict(lmem)
ggplot(envvariable, aes(x = predicted, y = prev)) +
geom_point(alpha = 0.5) +
geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
labs(title = "Predicted vs Observed Cliona Prevalence",
x = "Predicted Prevalence",
y = "Observed Prevalence") +
theme_minimal()
The results of the linear mixed-effects model indicate that coral size, represented by the scaled variable mean_volume_scaled, is a significant predictor of Cliona sponge prevalence. For every one standard deviation increase in coral size, Cliona prevalence increases by approximately 0.0045 units, highlighting the importance of larger corals as suitable habitats for Cliona sponges. In contrast, environmental events such as bleaching, hurricanes, and SCTLD showed no significant effects on Cliona prevalence in this dataset, suggesting these disturbances may not be primary drivers of Cliona colonization patterns.
The random effects for location and year showed minimal variability, indicating that most of the observed trends in Cliona prevalence are explained by the fixed effects, particularly coral size. While the model provides a good fit to the data, the lack of significance for environmental predictors implies that other factors, beyond these disturbances, might better explain spatial and temporal variations in Cliona prevalence.
Does not include random effects, meaning all variability is attributed to the fixed effects
# Run a Generalized Linear Model (GLM)
glm_model <- glm(prev ~ Bleaching + Hurricane + SCTLD + mean_volume_scaled,
data = envvariable, family = quasibinomial)
summary(glm_model)
##
## Call:
## glm(formula = prev ~ Bleaching + Hurricane + SCTLD + mean_volume_scaled,
## family = quasibinomial, data = envvariable)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.32425 0.04043 -82.227 < 2e-16 ***
## Bleaching 0.34027 0.13519 2.517 0.0119 *
## Hurricane 0.05151 0.14105 0.365 0.7150
## SCTLD -0.49597 0.21618 -2.294 0.0219 *
## mean_volume_scaled 0.09563 0.01871 5.111 3.41e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 0.1220545)
##
## Null deviance: 265.71 on 2777 degrees of freedom
## Residual deviance: 261.60 on 2773 degrees of freedom
## (42 observations deleted due to missingness)
## AIC: NA
##
## Number of Fisher Scoring iterations: 6
# Extract rows used in the model
model_data <- model.frame(glm_model)
# Generate predicted probabilities
predicted_values <- predict(glm_model, type = "response")
# Add a predicted column with NA for unmatched rows
envvariable$predicted <- NA
envvariable$predicted[as.numeric(rownames(model_data))] <- predicted_values
# Plot predicted vs observed prevalence
ggplot(envvariable, aes(x = predicted, y = prev)) +
geom_point(alpha = 0.5) +
geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
labs(title = "Predicted vs Observed Cliona Prevalence",
x = "Predicted Prevalence",
y = "Observed Prevalence") +
theme_minimal()
# Effect of Bleaching
ggplot(envvariable, aes(x = factor(Bleaching), y = predicted)) +
geom_boxplot() +
labs(title = "Effect of Bleaching on Predicted Cliona Prevalence",
x = "Bleaching Event (0 = No, 1 = Yes)",
y = "Predicted Prevalence") +
theme_minimal()
The GLM results indicate that both Bleaching and SCTLD significantly influence Cliona prevalence, while Hurricane does not. Specifically, bleaching events are associated with a significant increase in Cliona prevalence (Estimate=0.34027,𝑝=0.0119), suggesting that bleaching may create favorable conditions for Cliona colonization. Conversely, SCTLD has a significant negative association (Estimate=−0.49597,𝑝=0.0219), implying that disease outbreaks may reduce Cliona prevalence, possibly due to mortality or competitive dynamics. Coral size, represented by the scaled variable mean_volume_scaled, is also highly significant (Estimate=0.09563,𝑝=3.41×10 −7), with larger corals being more likely to host Cliona sponges. The intercept (Estimate=−3.32425, p<2×10−16) reflects the baseline log-odds of Cliona prevalence in the absence of environmental events and at the average coral size. The model explains a small reduction in deviance from the null model (Null Deviance = 265.71 to Residual Deviance = 261.60), suggesting moderate explanatory power. Overall, this analysis highlights coral size as the strongest predictor of Cliona prevalence, with environmental disturbances like bleaching and SCTLD also playing significant but contrasting roles.
ggplot(envvariable, aes(x = mean_volume_scaled)) + geom_histogram(binwidth = 0.5) + labs(title = "Histogram of Coral Volume (Scaled)")
## Warning: Removed 42 rows containing non-finite outside the scale range
## (`stat_bin()`).
ggplot(envvariable, aes(x = prev)) + geom_histogram(binwidth = 0.02) + labs(title = "Histogram of Cliona Prevalence")
Compare the fixed effects (coefficients) for predictors like Bleaching, SCTLD, and mean_volume_scaled in both models to check for consistency
# Extract coefficients for GLM
glm_coef <- coef(summary(glm_model))
# Extract coefficients for LME
lme_coef <- fixef(lmem)
# Print side-by-side comparison
glm_coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.32424753 0.04042782 -82.2267336 0.000000e+00
## Bleaching 0.34026517 0.13518526 2.5170287 1.189088e-02
## Hurricane 0.05151373 0.14104916 0.3652183 7.149763e-01
## SCTLD -0.49596527 0.21618193 -2.2942031 2.185325e-02
## mean_volume_scaled 0.09563155 0.01870914 5.1114889 3.413617e-07
lme_coef
## (Intercept) Bleaching Hurricane SCTLD
## 0.032443453 0.015383190 0.003856983 -0.013428680
## mean_volume_scaled
## 0.004516402
Examine the residuals to ensure the models fit the data
GLM Residuals
plot(glm_model, which = 1) # Residuals vs. Fitted
LME Residuals
plot(lmem) # Residuals for LME
qqnorm(resid(lmem)) # Q-Q plot
qqline(resid(lmem))
Spencer Parr
#Subset into a data frame that only has SPP and Cliona presence
cliona_subset <- cliona %>% select(SPP, presence)
# Count the total observations per coral species
species_counts <- cliona_subset %>%
group_by(SPP) %>%
summarise(total_observations = n()) %>%
arrange(desc(total_observations))
# Calculate the proportion of observations per species
species_proportion <- species_counts %>%
mutate(proportion_observed = total_observations / sum(total_observations))
#Count the Total Cliona Occurrences Per Species
cliona_presence_counts <- cliona_subset %>%
group_by(SPP) %>%
summarise(cliona_present = sum(presence, na.rm = TRUE))
# Merge with total species observations
species_cliona_prob <- species_counts %>%
left_join(cliona_presence_counts, by = "SPP") %>%
mutate(cliona_probability = cliona_present / total_observations)
The dataset contains observations of Cliona prevalence across multiple coral species. However, some species may have very few observations, which makes statistical inference unreliable. If a species has too few observations, we might miss detecting Cliona even if it is present which is a Type II error (failing to reject a false null hypothesis). If a species has sufficient observations, we can be statistically confident in estimating Cliona prevalence. Thus, power analysis helps determine how many observations are needed per species to ensure reliable estimates of Cliona prevalence.
\[ h = 2 \times \left( \asin \left( \sqrt{p_1} \right) - \asin \left( \sqrt{p_2} \right) \right) \]
# Define expected prevalence rates
p1 <- 0.01 # Smallest Cliona prevalence worth detecting
#the lowest prevalence of cliona that has more than 1 observation for that coral species
p2 <- mean(species_cliona_prob$cliona_probability[species_cliona_prob$cliona_probability > 0], na.rm = TRUE)
# Calculate Cohen's h effect size
h_effect_size <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
# Run power analysis to find the required sample size per species
power_result <- pwr.p.test(h = abs(h_effect_size), sig.level = 0.05, power = 0.8)
The result of the power analysis suggests that at least 157 observations per species are needed to make a statistically confident estimate of Cliona prevalence.
print(power_result$n)
## [1] 350.2433
# Define threshold based on power analysis
min_observations_threshold <- ceiling(power_result$n)
# Filter out species with too few observations
filtered_species <- species_cliona_prob %>%
filter(total_observations >= ceiling(power_result$n))
# Load site depth data (Assuming depth data is in "SiteMetadata.csv")
site_metadata <- read_csv("TCRMP_SiteMetadata.csv") %>%
select(Location, Depth)
# Filter Cliona data to keep only chosen species
cliona_filtered <- cliona %>%
filter(SPP %in% filtered_species$SPP)
# Summarize by **Location AND Coral Species (SPP)**
location_species_summary <- cliona_filtered %>%
group_by(Location, SPP) %>%
summarise(
Total_Abundance = n(),
Cliona_Presence = sum(presence, na.rm = TRUE),
Total_Surveys = n(),
Cliona_Prevalence = Cliona_Presence / Total_Surveys
) %>%
ungroup()
# Merge Depth Data (Ensuring Each Location Retains Its Depth)
location_species_summary <- location_species_summary %>%
left_join(site_metadata, by = "Location")
ggplot(location_species_summary, aes(x = Depth, y = Cliona_Prevalence, color = SPP)) +
geom_point(size = 3, alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
labs(title = "Cliona Prevalence vs. Depth by Coral Species",
x = "Depth (m)",
y = "Cliona Prevalence",
color = "Coral Species") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# Load site metadata (Depth category and region info)
site_metadata <- read_csv("TCRMP_SiteMetadata.csv") %>%
select(Location, Depth, ReefComplex, Island) # Include ReefComplex and Island
# Merge Cliona data with site metadata
cliona_filtered <- location_species_summary %>%
left_join(site_metadata, by = "Location")
# **Split Data by Region: STT/STJ vs. STX**
cliona_filtered <- cliona_filtered %>%
mutate(ReefComplex_Region = case_when(
Island == "STT" | Island == "STJ" ~ paste0("STT/STJ - ", ReefComplex),
Island == "STX" ~ paste0("STX - ", ReefComplex),
TRUE ~ "Other"
))
deep_sites <- cliona_filtered %>%
filter(ReefComplex %in% c("Nearshore", "Offshore") & Depth.x >= 15) # Adjust depth threshol
# Plot Cliona prevalence by ReefComplex
ggplot(deep_sites, aes(x = ReefComplex, y = Cliona_Prevalence, fill = ReefComplex)) +
geom_boxplot(alpha = 0.6) +
facet_wrap(~Island) + # Separate by region (St. Thomas/St. John vs. St. Croix)
labs(title = "Cliona Prevalence in Nearshore vs. Offshore Deep Sites",
x = "Reef Complex",
y = "Cliona Prevalence") +
theme_minimal()
ggplot(deep_sites, aes(x = ReefComplex_Region, y = Cliona_Prevalence, fill = ReefComplex_Region)) +
geom_boxplot(alpha = 0.6) +
facet_wrap(~SPP, scales = "free_y") + # Separate by coral species
labs(title = "Cliona Prevalence by Reef Complex and Coral Species",
x = "Reef Complex (Region)",
y = "Cliona Prevalence",
fill = "Reef Complex (Region)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Check the dataset
head(deep_sites)
## # A tibble: 6 × 11
## Location SPP Total_Abundance Cliona_Presence Total_Surveys Cliona_Prevalence
## <chr> <chr> <int> <dbl> <int> <dbl>
## 1 Buck Is… AA 282 0 282 0
## 2 Buck Is… AG 13 0 13 0
## 3 Buck Is… AGSP 9 0 9 0
## 4 Buck Is… AH 30 0 30 0
## 5 Buck Is… AL 10 1 10 0.1
## 6 Buck Is… MC 32 0 32 0
## # ℹ 5 more variables: Depth.x <dbl>, Depth.y <dbl>, ReefComplex <chr>,
## # Island <chr>, ReefComplex_Region <chr>
ANOVA: Does prevalence differ by reef complex region? F(2, 120) = 10.45, p < 0.001 #There is a statistically significant difference in Cliona prevalence between at least two of the reef complex region groups.
# ANOVA: Does prevalence differ by reef complex region?
anova_result <- aov(Cliona_Prevalence ~ ReefComplex_Region, data = deep_sites)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## ReefComplex_Region 2 0.02478 0.012389 10.45 6.54e-05 ***
## Residuals 120 0.14225 0.001185
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Post-hoc test: Tukey HSD
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Cliona_Prevalence ~ ReefComplex_Region, data = deep_sites)
##
## $ReefComplex_Region
## diff lwr upr
## STX - Nearshore-STT/STJ - Offshore 0.041878078 0.01981597 0.06394019
## STX - Offshore-STT/STJ - Offshore 0.002714257 -0.01416095 0.01958946
## STX - Offshore-STX - Nearshore -0.039163820 -0.06331857 -0.01500907
## p adj
## STX - Nearshore-STT/STJ - Offshore 0.0000460
## STX - Offshore-STT/STJ - Offshore 0.9228867
## STX - Offshore-STX - Nearshore 0.0005610
STX Nearshore sites have significantly higher Cliona prevalence compared to both STX Offshore and STT/STJ Offshore reefs.There is no significant difference between STX Offshore and STT/STJ Offshore, suggesting the offshore zones across islands are more similar in Cliona prevalence.This supports a depth × reef zone interaction hypothesis — perhaps driven by environmental gradients (e.g., light, nutrients, stress) unique to STX nearshore reefs.
# Load necessary libraries
pacman::p_load(ggplot2, dplyr, multcompView, agricolae, tidyr)
# Run ANOVA
aov_result <- aov(Cliona_Prevalence ~ ReefComplex_Region, data = deep_sites)
# Run Tukey HSD and assign group letters
hsd_letters <- HSD.test(aov_result, "ReefComplex_Region", group = TRUE)
# View the groups with assigned letters
print(hsd_letters$groups)
## Cliona_Prevalence groups
## STX - Nearshore 0.05300301 a
## STX - Offshore 0.01383919 b
## STT/STJ - Offshore 0.01112493 b
# Calculate means and standard errors for each group
means <- deep_sites %>%
group_by(ReefComplex_Region) %>%
summarise(
mean = mean(Cliona_Prevalence, na.rm = TRUE),
se = sd(Cliona_Prevalence, na.rm = TRUE) / sqrt(n())
) %>%
# Merge HSD group letters
left_join(
hsd_letters$groups %>% rownames_to_column("ReefComplex_Region"),
by = "ReefComplex_Region"
)
ggplot(means, aes(x = ReefComplex_Region, y = mean, fill = ReefComplex_Region)) + geom_bar(stat = "identity", width = 0.6, alpha = 0.7, color = "black") + geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) + geom_text(aes(label = groups, y = mean + se + 0.01), vjust = 0, size = 5) + labs(title = "Cliona Prevalence by Reef Complex Region", x = "Reef Complex Region", y = "Mean Cliona Prevalence") + theme_minimal() + theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(size = 16, face = "bold"))
glm_interaction <- glm(Cliona_Prevalence ~ Depth.x * ReefComplex_Region, family = quasibinomial, data = cliona_filtered)
summary(glm_interaction)
##
## Call:
## glm(formula = Cliona_Prevalence ~ Depth.x * ReefComplex_Region,
## family = quasibinomial, data = cliona_filtered)
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) -7.419731 3.123971 -2.375
## Depth.x 0.079950 0.084636 0.945
## ReefComplex_RegionSTT/STJ - Nearshore 2.689852 3.323834 0.809
## ReefComplex_RegionSTT/STJ - Offshore 5.474276 3.219854 1.700
## ReefComplex_RegionSTX - Mesophotic -2.824363 4.852053 -0.582
## ReefComplex_RegionSTX - Nearshore 3.243838 3.183047 1.019
## ReefComplex_RegionSTX - Offshore 5.688702 3.161348 1.799
## Depth.x:ReefComplex_RegionSTT/STJ - Nearshore 0.092521 0.168455 0.549
## Depth.x:ReefComplex_RegionSTT/STJ - Offshore -0.218290 0.101113 -2.159
## Depth.x:ReefComplex_RegionSTX - Mesophotic 0.087771 0.137030 0.641
## Depth.x:ReefComplex_RegionSTX - Nearshore -0.004156 0.097509 -0.043
## Depth.x:ReefComplex_RegionSTX - Offshore -0.191551 0.094033 -2.037
## Pr(>|t|)
## (Intercept) 0.0179 *
## Depth.x 0.3453
## ReefComplex_RegionSTT/STJ - Nearshore 0.4187
## ReefComplex_RegionSTT/STJ - Offshore 0.0897 .
## ReefComplex_RegionSTX - Mesophotic 0.5607
## ReefComplex_RegionSTX - Nearshore 0.3086
## ReefComplex_RegionSTX - Offshore 0.0725 .
## Depth.x:ReefComplex_RegionSTT/STJ - Nearshore 0.5831
## Depth.x:ReefComplex_RegionSTT/STJ - Offshore 0.0313 *
## Depth.x:ReefComplex_RegionSTX - Mesophotic 0.5221
## Depth.x:ReefComplex_RegionSTX - Nearshore 0.9660
## Depth.x:ReefComplex_RegionSTX - Offshore 0.0421 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 0.1060844)
##
## Null deviance: 41.230 on 559 degrees of freedom
## Residual deviance: 35.309 on 548 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 8
In offshore reefs (STT/STJ and STX), Cliona prevalence decreases significantly with depth, as shown by the negative and significant interaction terms.The effect of depth is not significant, meaning Cliona may be more uniformly distributed across depths in those zones. This supports the hypothesis that reef type moderates the impact of depth on Cliona.
pacman::p_load(emmeans, broom)
# Fit interaction model (already done)
glm_interaction <- glm(Cliona_Prevalence ~ Depth.x * ReefComplex_Region, family = quasibinomial, data = cliona_filtered)
# Create a depth sequence
depth_seq <- seq(min(cliona_filtered$Depth.x, na.rm = TRUE),
max(cliona_filtered$Depth.x, na.rm = TRUE),
length.out = 100)
# Create a prediction grid
pred_grid <- expand.grid(
Depth.x = depth_seq,
ReefComplex_Region = unique(cliona_filtered$ReefComplex_Region)
)
# Add predicted values from your interaction model (replace with your actual model name)
pred_grid$predicted <- predict(glm_interaction,
newdata = pred_grid,
type = "response")
ggplot(cliona_filtered, aes(x = Depth.x, y = Cliona_Prevalence, color = ReefComplex_Region)) +
geom_point(alpha = 0.4) +
geom_line(data = pred_grid,
aes(x = Depth.x, y = predicted, color = ReefComplex_Region),
size = 1.2) +
labs(title = "Predicted Cliona Prevalence vs. Depth",
x = "Depth (m)",
y = "Predicted Cliona Prevalence") +
theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
glm_simple <- glm(Cliona_Prevalence ~ Depth.x + ReefComplex_Region, family = quasibinomial, data = cliona_filtered)
anova(glm_simple, glm_interaction, test = "Chisq") # Likelihood ratio test
## Analysis of Deviance Table
##
## Model 1: Cliona_Prevalence ~ Depth.x + ReefComplex_Region
## Model 2: Cliona_Prevalence ~ Depth.x * ReefComplex_Region
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 553 37.514
## 2 548 35.309 5 2.2051 0.000889 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model with the Depth.x * ReefComplex_Region interaction is significantly better than the simpler additive model. This confirms that the effect of depth depends on the reef zone/region, and you were right to include that interaction term.
emm <- emmeans(glm_interaction, ~ ReefComplex_Region | Depth.x, at = list(Depth.x = c(5, 15, 30)))
summary(emm)
## Depth.x = 5:
## ReefComplex_Region emmean SE df asymp.LCL asymp.UCL
## STT/STJ - Mesophotic -7.020 2.700 Inf -12.32 -1.7187
## STT/STJ - Nearshore -3.868 0.429 Inf -4.71 -3.0267
## STT/STJ - Offshore -2.637 0.520 Inf -3.66 -1.6180
## STX - Mesophotic -9.405 3.180 Inf -15.63 -3.1785
## STX - Nearshore -3.797 0.385 Inf -4.55 -3.0425
## STX - Offshore -2.289 0.308 Inf -2.89 -1.6846
##
## Depth.x = 15:
## ReefComplex_Region emmean SE df asymp.LCL asymp.UCL
## STT/STJ - Mesophotic -6.220 1.870 Inf -9.89 -2.5520
## STT/STJ - Nearshore -2.143 1.080 Inf -4.25 -0.0347
## STT/STJ - Offshore -4.021 0.232 Inf -4.48 -3.5659
## STX - Mesophotic -7.728 2.110 Inf -11.87 -3.5904
## STX - Nearshore -3.039 0.226 Inf -3.48 -2.5964
## STX - Offshore -3.405 0.259 Inf -3.91 -2.8965
##
## Depth.x = 30:
## ReefComplex_Region emmean SE df asymp.LCL asymp.UCL
## STT/STJ - Mesophotic -5.021 0.686 Inf -6.37 -3.6762
## STT/STJ - Nearshore 0.444 3.250 Inf -5.93 6.8168
## STT/STJ - Offshore -6.096 0.936 Inf -7.93 -4.2605
## STX - Mesophotic -5.212 0.598 Inf -6.38 -4.0406
## STX - Nearshore -1.902 0.886 Inf -3.64 -0.1661
## STX - Offshore -5.079 0.809 Inf -6.67 -3.4925
##
## Results are given on the logit (not the response) scale.
## Confidence level used: 0.95
emmip(glm_interaction, ReefComplex_Region ~ Depth.x, cov.reduce = range, CIs = TRUE) + labs(title = "Estimated Cliona Prevalence by Reef Complex Region", x = "Depth (m)", y = "Estimated Cliona Prevalence") + theme_minimal()
All values are on the logit scale (log-odds of Cliona prevalence). As depth increases, Cliona prevalence (logit) drops sharply in most reef complexes, particularly STT/STJ - Offshore and STX - Offshore.Some groups (like STT/STJ - Nearshore at 30 m) show very wide CIs, suggesting high uncertainty (likely due to sparse data there).
Back-transforming logits to probability scale
emm_df <- as.data.frame(emm)
emm_df$probability <- plogis(emm_df$emmean)
ggplot(emm_df, aes(x = as.factor(Depth.x), y = probability, fill = ReefComplex_Region)) + geom_col(position = position_dodge(width = 0.8)) + geom_errorbar(aes(ymin = plogis(asymp.LCL), ymax = plogis(asymp.UCL)), width = 0.2, position = position_dodge(width = 0.8)) + labs(title = "Estimated Cliona Prevalence by Depth and Reef Complex Region", x = "Depth (m)", y = "Estimated Prevalence") + theme_minimal() + theme(axis.text.x = element_text(size = 12), plot.title = element_text(hjust = 0.5, face = "bold"))
This plot visualizes estimated Cliona prevalence across different depths (5, 15, and 30 meters) and reef complex regions, showing how sponge prevalence varies with both reef zone and geographic region (STT/STJ vs. STX) Cliona prevalence is generally highest in shallow offshore sites, particularly at 5–15 m. Mesophotic and some nearshore zones have high uncertainty—indicating a need for more targeted sampling in those regions.
# Load depth metadata
site_metadata <- read_csv("TCRMP_SiteMetadata.csv") %>%
select(Location, Depth, ReefComplex, Island)
# Load distance-from-shore metadata
site_impact <- read_csv("site_impact.csv") %>%
select(Location, Distance)
# Merge both metadata sources
site_metadata <- site_metadata %>%
left_join(site_impact, by = "Location") %>%
rename(Distance_from_Shore = Distance)
# Join location_species_summary (species-level prevalence) with site info
species_level_data <- location_species_summary %>%
select(-Depth) %>% # Drop old/conflicting Depth column if it exists
left_join(site_metadata, by = "Location") %>%
filter(ReefComplex %in% c("Nearshore", "Offshore"), Depth >= 15) %>%
mutate(ReefComplex_Region = case_when(
Island %in% c("STT", "STJ") ~ paste0("STT/STJ - ", ReefComplex),
Island == "STX" ~ paste0("STX - ", ReefComplex),
TRUE ~ NA_character_
))
# 1. Depth vs Cliona prevalence (by coral species)
ggplot(species_level_data, aes(x = Depth, y = Cliona_Prevalence, color = SPP)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~SPP, scales = "free_y") +
labs(title = "Depth vs Cliona Prevalence by Coral Species",
x = "Depth (m)", y = "Cliona Prevalence") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# 2. Distance from shore vs Cliona prevalence (by coral species)
ggplot(species_level_data, aes(x = Distance_from_Shore, y = Cliona_Prevalence, color = SPP)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~SPP, scales = "free_y") +
labs(title = "Distance from Shore vs Cliona Prevalence by Coral Species",
x = "Distance from Shore (m)", y = "Cliona Prevalence") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# 3. Depth vs Cliona prevalence (site level)
site_level_data <- species_level_data %>%
group_by(Location, Depth) %>%
summarise(Cliona_Prevalence = mean(Cliona_Prevalence, na.rm = TRUE), .groups = "drop")
ggplot(site_level_data, aes(x = Depth, y = Cliona_Prevalence)) +
geom_point(color = "darkblue", size = 3, alpha = 0.6) +
geom_smooth(method = "lm", se = TRUE, color = "black") +
labs(title = "Depth vs Cliona Prevalence by Site",
x = "Depth (m)", y = "Mean Cliona Prevalence") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# 4. Distance from shore vs Cliona prevalence (site level)
site_level_distance <- species_level_data %>%
group_by(Location, Distance_from_Shore) %>%
summarise(Cliona_Prevalence = mean(Cliona_Prevalence, na.rm = TRUE), .groups = "drop")
ggplot(site_level_distance, aes(x = Distance_from_Shore, y = Cliona_Prevalence)) +
geom_point(color = "darkgreen", size = 3, alpha = 0.6) +
geom_smooth(method = "lm", se = TRUE, color = "black") +
labs(title = "Distance from Shore vs Cliona Prevalence by Site",
x = "Distance from Shore (m)", y = "Mean Cliona Prevalence") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# Group and summarize prevalence for each coral species within each reef complex region
grouped_prevalence <- species_level_data %>%
filter(ReefComplex_Region %in% c("STX - Nearshore", "STX - Offshore", "STT/STJ - Offshore")) %>%
group_by(ReefComplex_Region, SPP) %>%
summarise(mean_prevalence = mean(Cliona_Prevalence, na.rm = TRUE), .groups = "drop")
# Plot the prevalence by coral species and reef complex region
ggplot(grouped_prevalence, aes(x = SPP, y = mean_prevalence, fill = ReefComplex_Region)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.7)) +
labs(title = "Cliona Prevalence by Coral Species and Reef Complex Region",
x = "Coral Species",
y = "Mean Cliona Prevalence",
fill = "Region") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
revert back to Coral Prevalence- Coral Health Data tab, steps 1-5 are identical
#1
cliona <- read_csv("../Cliona.csv")%>%
select(Period,SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, Spo2)%>%
#2
mutate(presence=if_else(Cliona>0 | Spo1ID %in% c("CLSP", "BOSP") | Spo2ID %in% c("CLSP", "BOSP"),1,0))%>%
mutate(presence=replace_na(presence,0))%>%
#3
filter(Period%in% c("Annual","PostBL")&SampleType%in% c("Permanent","permanent")&Method%in% c("intercept","50 cm belt"))%>%
filter(!grepl("A",Transect))%>%
filter(SampleYear>=2005)%>%
#4
mutate(ID=1:nrow(.),
SampleYear=as.factor(SampleYear),
Location=as.factor(Location),
Transect=as.factor(Transect))
#5
transectprev <- cliona %>%
group_by(Location, SampleYear, Transect) %>%
summarise(prev = mean(presence), freq = sum(presence), total = n())%>%
ungroup()%>%
mutate(ID= 1:nrow(.))Read in the TCRMP site metadata csv. This data frame has 8 columns: Location (e.g., Coral Bay), Code (e.g., CRB), Island (e.g., STJ), Latitude (e.g., 18.338), Longitude (e.g., -64.704), ReefComplex (e.g., Nearshore), Depth (e.g., 9), and YearAdded (e.g., 2011)
reef_complex <- read_csv("../SiteMetadata.csv")Next I merge the transectprev data
frame with the reef_complex data frame,
adding reef complex information (e.g., Nearshore, Offshore, Mesophotic)
to the corresponding locations in
transectprev based on the
Location column.
Reef_Types <- transectprev %>%
left_join(reef_complex, by = "Location")This code groups the Reef_Types
data frame by Location and
ReefComplex, calculates the mean
prevalence (prev) for each group while
ignoring missing values, and then converts
ReefComplex into a factor with the levels
ordered as “Nearshore,” “Offshore,” and
“Mesophotic.”
reef_sum <- Reef_Types %>%
group_by(Location, ReefComplex) %>%
summarise(
mean_prevalence = mean(prev, na.rm = TRUE) # Calculate mean prevalence
)%>%
mutate( ReefComplex = factor(ReefComplex, levels = c("Nearshore", "Offshore", "Mesophotic")))This code generates a boxplot to visualize the mean prevalence of
Cliona sponges across different reef complexes using the
reef_sum dataset. It maps ReefComplex to the
x-axis, mean_prevalence to the y-axis, and uses colors to
distinguish reef complexes.
ggplot(reef_sum, aes(x = ReefComplex, y = mean_prevalence, fill = ReefComplex)) +
geom_boxplot(outlier.color = "red", outlier.shape = 16) +
geom_jitter(width = 0.2, size = 2, alpha = 0.6) + # Add jitter for data points
labs(
title = "Cliona Sponge Prevalence Across Reef Complexes",
x = "Reef Complex",
y = "Mean Prevalence"
) +
theme_minimal() +
theme(
legend.position = "none",
axis.text.x = element_text(angle = 45, hjust = 1)
)
Next we run a one-way Analysis of Variance (ANOVA) to test whether there is a statistically significant difference in mean Cliona sponge prevalence across different ReefComplex categories (Nearshore, Offshore, and Mesophotic).
# Run ANOVA
anova_result <- aov(mean_prevalence ~ ReefComplex, data = reef_sum)
The ANOVA results indicate a statistically significant difference in the mean prevalence of Cliona sponge coverage across the three reef complexes (Nearshore, Offshore, and Mesophotic). The ReefComplex factor accounts for a Sum of Squares (Sum Sq) of 0.005884, while the residual variation (unexplained variance) accounts for 0.013323. The Mean Square (Mean Sq) for ReefComplex is 0.0029420, and the F-statistic is 6.625, with an associated p-value (Pr(>F)) of 0.00414. This p-value is below the conventional threshold of 0.05, suggesting that the differences in Cliona prevalence among the reef complexes are statistically significant.
# Create a ANOVA table
broom::tidy(anova_result) %>%
gt() %>%
tab_header(
title = "ANOVA Results for Cliona Sponge Prevalence Across Reef Complexes"
)
| ANOVA Results for Cliona Sponge Prevalence Across Reef Complexes | |||||
| term | df | sumsq | meansq | statistic | p.value |
|---|---|---|---|---|---|
| ReefComplex | 2 | 0.005876101 | 0.002938051 | 6.605047 | 0.004197984 |
| Residuals | 30 | 0.013344571 | 0.000444819 | NA | NA |
Next step is the TukeyHSD (Tukey’s Honest Significant Differences) test which is a post-hoc analysis used to identify which specific group pairs have significant differences in their means. It adjusts for multiple comparisons to control the family-wise error rate, ensuring that conclusions about group differences are not overstated.
tukey_result <- TukeyHSD(anova_result)
The TukeyHSD test shows a significant difference in Cliona sponge prevalence between Nearshore and Mesophotic reefs, with Nearshore having a higher mean prevalence (difference = 0.0331, p = 0.003). No significant differences were found between Offshore and Mesophotic reefs (p = 0.207) or Offshore and Nearshore reefs (p = 0.151). These results indicate that Nearshore reefs have notably higher prevalence compared to Mesophotic reefs, while other comparisons show no meaningful differences.
# Create a TukeyHSD table
broom::tidy(tukey_result) %>%
gt() %>%
tab_header(
title = "TukeyHSD Results for Cliona Sponge Prevalence Across Reef Complexes"
)
| TukeyHSD Results for Cliona Sponge Prevalence Across Reef Complexes | ||||||
| term | contrast | null.value | estimate | conf.low | conf.high | adj.p.value |
|---|---|---|---|---|---|---|
| ReefComplex | Offshore-Nearshore | 0 | -0.01635689 | -0.03765759 | 0.004943818 | 0.158150374 |
| ReefComplex | Mesophotic-Nearshore | 0 | -0.03307402 | -0.05562030 | -0.010527744 | 0.003020255 |
| ReefComplex | Mesophotic-Offshore | 0 | -0.01671713 | -0.04008687 | 0.006652605 | 0.199058088 |
We still need to check for normality of residuals and homogeneity of variances. For normality a Shapiro-Wilk statistic will be ran, and for homogeneity of variances an Levene’s test will be ran.
The Shapiro-Wilk test produced a W statistic of 0.965 and a p-value of 0.3555. Since the p-value is greater than the conventional threshold of 0.05, we fail to reject the null hypothesis that the residuals are normally distributed. This indicates that the assumption of normality for the ANOVA model’s residuals is satisfied, supporting the validity of the ANOVA results.
shapiro<-shapiro.test(residuals(anova_result))
# Create a shapiro table
broom::tidy(shapiro) %>%
gt() %>%
tab_header(
title = "Shapiro-Wilk Results for Cliona Sponge Prevalence Across Reef Complexes"
)
| Shapiro-Wilk Results for Cliona Sponge Prevalence Across Reef Complexes | ||
| statistic | p.value | method |
|---|---|---|
| 0.9635491 | 0.3246365 | Shapiro-Wilk normality test |
The Q-Q plot shows that the residuals align closely with the red reference line, indicating that they are approximately normally distributed. The histogram demonstrates that the residuals are symmetrically distributed around zero. And the density plot compares the residuals’ distribution (blue line) to a theoretical normal curve (red dashed line), showing strong alignment with minor deviations. These plots confirm that the residuals satisfy the normality assumption required for ANOVA, supporting the validity of the statistical analysis.
# Generate individual plots
qq_plot <- ggplot(data.frame(residuals = residuals(anova_result)), aes(sample = residuals)) +
stat_qq() + stat_qq_line(color = "red") +
labs(title = "Q-Q Plot of Residuals")
hist_plot <- ggplot(data.frame(residuals = residuals(anova_result)), aes(x = residuals)) +
geom_histogram(binwidth = 0.005, fill = "lightblue", color = "black") +
labs(title = "Histogram of Residuals", x = "Residuals", y = "Frequency") +
theme_minimal()
density_plot <- ggplot(data.frame(residuals = residuals(anova_result)), aes(x = residuals)) +
geom_density(color = "blue", fill = "lightblue", alpha = 0.5) +
stat_function(fun = dnorm, args = list(mean = mean(residuals(anova_result)),
sd = sd(residuals(anova_result))),
color = "red", linetype = "dashed") +
labs(title = "Density Plot with Normal Curve", x = "Residuals", y = "Density")
# Arrange plots
grid.arrange(qq_plot, hist_plot, density_plot, nrow = 1)
The Levene’s test results show an F-value of 2.3661 with a p-value of 0.1111. Since the p-value is greater than the threshold of 0.05, we fail to reject the null hypothesis that the group variances are equal. This indicates that the assumption of homogeneity of variance is satisfied, supporting the appropriateness of using ANOVA for this data.
levene<-leveneTest(mean_prevalence ~ ReefComplex, data = reef_sum)
# Create a Levene table
broom::tidy(levene) %>%
gt() %>%
tab_header(
title = "Levene's Test Results for Cliona Sponge Prevalence Across Reef Complexes"
)
| Levene's Test Results for Cliona Sponge Prevalence Across Reef Complexes | |||
| statistic | p.value | df | df.residual |
|---|---|---|---|
| 2.550193 | 0.09487255 | 2 | 30 |
Finally, we need to visualize the original mean prevalence
between reef complexes but now with TukeyHSD multcompletters. First, the
TukeyHSD function is applied to the ANOVA
model to compute pairwise comparisons of means for the “ReefComplex”
variable, identifying which groups differ significantly. The
multcompView::multcompLetters function
then assigns grouping letters (e.g., “a”, “b”) based on adjusted
p-values, indicating which groups are statistically similar or
different. These labels are converted into a data frame, with the
“ReefComplex” levels and their respective group letters for easy
handling. Finally, the left_join function
integrates the Tukey test results (group letters) into the
reef_sum dataset.
####visualise the resutls#######
# Extract Tukey test results
tukey_result <- TukeyHSD(anova_result, "ReefComplex")
# Convert Tukey results to group letters
library(multcompView)
tukey_labels <- multcompLetters(tukey_result$ReefComplex[, "p adj"])$Letters
# Convert to a data frame for merging
tukey_labels <- data.frame(
ReefComplex = names(tukey_labels),
group = tukey_labels
)
# Merge tukey_labels with your reef_sum data
reef_sum <- reef_sum %>%
left_join(tukey_labels, by = "ReefComplex")This code generates a boxplot comparing mean Cliona sponge prevalence across three reef complexes: Mesophotic, Nearshore, and Offshore, with statistical group labels from Tukey’s post-hoc test. The boxplot shows the median prevalence (horizontal line), spread (interquartile range and whiskers), and outliers (red dots). Statistical group labels (“a”, “b”, “ab”) indicate significant differences: Mesophotic reefs have significantly lower prevalence than Nearshore (p = 0.003), while Offshore overlaps with both. Nearshore reefs have the highest prevalence and variability, Mesophotic the lowest with narrow spread, and Offshore shows intermediate prevalence. This visualization highlights depth-related differences in sponge prevalence, suggesting ecological patterns tied to reef complex characteristics.
# Plot with Tukey labels
ggplot(reef_sum, aes(x = ReefComplex, y = mean_prevalence, fill = ReefComplex)) +
geom_boxplot(outlier.color = "red", outlier.shape = 16) +
geom_jitter(width = 0.2, size = 2, alpha = 0.6) +
geom_text(
aes(label = group, x = ReefComplex, y = max(mean_prevalence) + 0.01),
data = reef_sum, inherit.aes = FALSE, color = "black", size = 4, vjust = -0.5
) +
labs(
title = "Cliona Sponge Prevalence Across Reef Complexes",
x = "Reef Complex",
y = "Mean Prevalence"
) +
theme_minimal() +
theme(
legend.position = "none",
axis.text.x = element_text(angle = 45, hjust = 1)
)
The data is read into R using read_csv, and then
selected columns are extracted, specifically: Period,
SampleType, Location, SampleYear,
Method, Transect, SPP,
Cliona, Spo1, Spo1ID,
Spo2ID, and Spo2. This selection reduces the
dataset to only the columns relevant to this analysis.
the code generates two new columns, CLSP and
BOSP, which indicate coverage values for two types of
sponges. Using the mutate and case_when
functions, it examines the columns Spo1ID and
Spo2ID to populate these new columns. If
Spo1ID contains “CLSP”, then the value of Spo1
is assigned to the CLSP column. Similarly, if
Spo2ID contains “CLSP”, then the value from
Spo2 is assigned to CLSP. The same process is
used to populate the BOSP column, using “BOSP” as the
indicator. If neither condition is met in either case, the value is set
to NA.
The code then filters the data based on several criteria. First,
it keeps only rows where Period is either “Annual” or
“PostBL” and SampleType is either “Permanent” or
“permanent”, ensuring consistency in sample type naming. Also, only rows
where Method is either “intercept” or “50 cm belt” are
retained, narrowing the dataset to specific sampling methods. And rows
with certain letters in the Transect column (“A”, “R”, or
“L”) are excluded using a filtering condition with grepl,
and only data from 2005 onward (SampleYear >= 2005) is
kept.
Finally, the data is reshaped from a wide format to a long format
using pivot_longer. This transformation takes the columns
CLSP, BOSP, and Cliona and
combines them into two new columns: SpongeType and
Extent. The SpongeType column identifies which
type of sponge coverage is being recorded (either CLSP,
BOSP, or Cliona), and Extent
contains the corresponding value of coverage for each type.
#1.
clionaext <- read_csv("../Cliona.csv")%>%
select(Period,SampleType, Location, SampleYear, Method, Transect, SPP, Cliona, Spo1, Spo1ID, Spo2ID, Spo2)%>%
#2.
mutate( CLSP = case_when( Spo1ID == "CLSP" ~ Spo1, Spo2ID == "CLSP" ~ Spo2, TRUE ~ NA_real_),
BOSP = case_when( Spo1ID == "BOSP" ~ Spo1, Spo2ID == "BOSP" ~ Spo2,
TRUE ~ NA_real_)) %>%
select(-Spo1, -Spo1ID, -Spo2, -Spo2ID)%>%
#3.
filter(Period%in% c("Annual","PostBL")&SampleType%in% c("Permanent","permanent")&Method%in% c("intercept","50 cm belt"))%>%
filter(!grepl("A", "R", "L",Transect))%>%
filter(SampleYear>=2005)%>%
mutate(ID=1:nrow(.),
SampleYear=as.factor(SampleYear),
Location=as.factor(Location),
Transect=as.factor(Transect))%>%
#4.
pivot_longer(cols = c(CLSP, BOSP, Cliona),
names_to = "SpongeType",
values_to = "Extent")
For the first analysis, I ask the question: Has the extent of Cliona sponge coverage increased or decreased over time? Is there a pattern in prevalence by year?
This code assigns the clionaext data frame to a new
data frame named ext_time. It then updates
ext_time by converting the SampleYear column
from a factor to a numeric data type, ensuring that the year values are
treated as continuous numbers rather than categorical labels.
ext_time<-clionaext
ext_time <- ext_time %>%
mutate(SampleYear = as.numeric(as.character(SampleYear)))This code processes the clionaext data frame to
calculate the total number of unique corals recorded in each year.
First, it groups the data by SampleYear and uses
n_distinct(ID) to count distinct ID values,
representing unique coral observations. The resulting data frame,
unique_coral_counts, contains SampleYear and
total_unique_corals.
Next, the SampleYear column is converted from a factor
to numeric using mutate. This ensures the year values are
treated as continuous numbers for analyses or visualizations that
require numerical formatting.
unique_coral_counts <- clionaext %>%
group_by(SampleYear) %>%
summarise(total_unique_corals = n_distinct(ID))
unique_coral_counts <- unique_coral_counts%>%
mutate(SampleYear = as.numeric(as.character(SampleYear)))
ext_time_with_counts, by merging (left_join)
the ext_time data frame with the
unique_coral_counts data frame. The join is performed using
the SampleYear column, which is common to both data
frames.ext_time_with_counts <- ext_time %>%
left_join(unique_coral_counts, by = "SampleYear")
ext_time_with_counts data frame
to summarize sponge coverage by year. It groups the data by
SampleYear and calculates the total extent of sponge
coverage (total_extent), retrieves the total unique coral
count for each year (count_observations), and computes the
weighted average extent (weighted_avg_extent) by dividing
the total extent by the unique coral count.weighted_summary <- ext_time_with_counts %>%
group_by(SampleYear) %>%
summarise(
total_extent = sum(Extent, na.rm = TRUE),
count_observations = first(total_unique_corals), # Use `first` to avoid altering the count
weighted_avg_extent = total_extent / count_observations
)
weighted_summary data frame to visualize the weighted
average extent of Cliona sponge coverage over time. It maps
SampleYear to the x-axis and
weighted_avg_extent to the y-axis, adding a blue line and
points to highlight the data, along with a minimal theme and descriptive
labels for the title, x-axis, and y-axis.ggplot(weighted_summary, aes(x = SampleYear, y = weighted_avg_extent)) +
geom_line(color = "blue") +
geom_point(size = 2) +
labs(
title = "Weighted Average Cliona Sponge Coverage Over Time",
x = "Year",
y = "Weighted Average Extent of Coverage"
) +
theme_minimal()
Next I ask the question: How does the extent of coverage differ between coral species each year? Are there species that experience fluctuating or consistent coverage levels over time?
spp_summary, by
grouping the ext_time_with_counts data frame by
SampleYear and SPP (coral species). For each
group, it calculates the total sponge coverage extent
(total_extent), retrieves the total unique coral count for
the year (count_observations), and computes the weighted
average sponge extent (weighted_avg_extent) by dividing the
total extent by the unique coral count. This provides a standardized
metric of sponge coverage for each coral species across years.spp_summary <- ext_time_with_counts %>%
group_by(SampleYear, SPP) %>%
summarise(
total_extent = sum(Extent, na.rm = TRUE),
count_observations = first(total_unique_corals),
weighted_avg_extent = total_extent / count_observations
)
spp_summary
data frame to visualize the weighted average extent of Cliona sponge
coverage over time for each coral species (SPP). The x-axis
represents SampleYear, the y-axis represents
weighted_avg_extent, and different coral species are
distinguished by color.ggplot(spp_summary, aes(x = SampleYear, y = weighted_avg_extent, color = SPP)) +
geom_line() +
geom_point() +
labs(
title = "Weighted Average Cliona Sponge Coverage by Coral Species Over Time",
x = "Year",
y = "Weighted Average Extent of Coverage"
) +
theme_minimal()
Spencer Parr
The Ivlev’s Electivity Index (Ei) is a quantitative measure used to determine an organism’s preference for certain food or habitat types relative to their availability. In this thesis, it is applied to assess the Cliona delitrix sponge’s substrate preferences on coral reefs. Specifically, the index compares the proportion of coral species colonized by Cliona to the proportion of those species available in the environment. A positive Ei value indicates a preference, while negative values suggest avoidance. This helps in understanding how Cliona sponges select coral species under different environmental conditions.
Electivity indices measure the utilization of food types (r) in relation to their abundance or availability in the environment (p). Foods that constitute a larger proportion of the diet than of the available foods are considered preferred; conversely those proportionately underrepresented in the diet are avoided.
While Ivlev’s Electivity Index is typically used for dietary preferences, it can still be effectively applied to Cliona delitrix sponges in this thesis, despite corals not being a food source. Cliona sponges exhibit substrate preferences for certain coral species based on factors such as calcium carbonate concentration. Some stony corals have denser calcium carbonate skeletons, which may be harder for Cliona to excavate and colonize. Thus, the sponge’s choice is driven by its ability to grow and obtain necessary nutrients, similar to dietary selectivity in food webs.
# Calculate coral abundance by transect and year
coral_abundance <- cliona %>%
group_by(SPP, Location, Transect, SampleYear) %>%
summarise(Abundance = n()) %>%
ungroup()
clionaclean1 <- cliona %>%
left_join(coral_abundance, by = c("SPP", "Location", "Transect", "SampleYear"))
SPP). The code creates a summary of each coral species,
providing an overall count of how often each species was present and its
total abundance across the entire dataset. This allows for a better
understanding of the relative presence and prevalence of different coral
species in the study area.# Summarize data to calculate total presence and abundance of each coral species
summary_cliona <- clionaclean1 %>%
group_by(SPP) %>%
summarise(TotalPresence = sum(presence),
TotalAbundance = sum(Abundance)) %>% # Adjust for coral abundance
ungroup()
# Calculate proportions and then Ivlev's Electivity Index adjusted for coral abundance
Ivlev_Index <- summary_cliona %>%
mutate(ProportionPresence = TotalPresence / sum(TotalPresence),
ProportionAvailability = TotalAbundance / sum(TotalAbundance), # Use abundance
IvlevIndex = (ProportionPresence - ProportionAvailability) /
(ProportionPresence + ProportionAvailability))
Ivlev_Index %>%
kable(caption = "Ivlev's Electivity Index Adjusted for Coral Abundance", digits = 4)
| SPP | TotalPresence | TotalAbundance | ProportionPresence | ProportionAvailability | IvlevIndex |
|---|---|---|---|---|---|
| AA | 72 | 41794 | 0.0378 | 0.0618 | -0.2408 |
| AC | 0 | 21 | 0.0000 | 0.0000 | -1.0000 |
| AF | 0 | 229 | 0.0000 | 0.0003 | -1.0000 |
| AG | 9 | 1142 | 0.0047 | 0.0017 | 0.4736 |
| AGSP | 3 | 17256 | 0.0016 | 0.0255 | -0.8837 |
| AH | 9 | 10869 | 0.0047 | 0.0161 | -0.5455 |
| AL | 29 | 9780 | 0.0152 | 0.0145 | 0.0259 |
| AP | 0 | 4 | 0.0000 | 0.0000 | -1.0000 |
| AT | 0 | 13 | 0.0000 | 0.0000 | -1.0000 |
| AU | 0 | 41 | 0.0000 | 0.0001 | -1.0000 |
| CN | 7 | 403 | 0.0037 | 0.0006 | 0.7210 |
| DCY | 0 | 60 | 0.0000 | 0.0001 | -1.0000 |
| DL | 7 | 365 | 0.0037 | 0.0005 | 0.7440 |
| DSO | 1 | 58 | 0.0005 | 0.0001 | 0.7193 |
| EF | 0 | 232 | 0.0000 | 0.0003 | -1.0000 |
| FF | 0 | 17 | 0.0000 | 0.0000 | -1.0000 |
| HC | 0 | 1355 | 0.0000 | 0.0020 | -1.0000 |
| IR | 0 | 10 | 0.0000 | 0.0000 | -1.0000 |
| IS | 0 | 4 | 0.0000 | 0.0000 | -1.0000 |
| MAFO | 0 | 62 | 0.0000 | 0.0001 | -1.0000 |
| MAL | 0 | 15 | 0.0000 | 0.0000 | -1.0000 |
| MAN | 0 | 6 | 0.0000 | 0.0000 | -1.0000 |
| MAR | 0 | 25 | 0.0000 | 0.0000 | -1.0000 |
| MC | 87 | 7315 | 0.0457 | 0.0108 | 0.6172 |
| MD | 11 | 2743 | 0.0058 | 0.0041 | 0.1751 |
| MDA | 0 | 3 | 0.0000 | 0.0000 | -1.0000 |
| MDSP | 0 | 33 | 0.0000 | 0.0000 | -1.0000 |
| MF | 1 | 24 | 0.0005 | 0.0000 | 0.8734 |
| MILA | 8 | 4438 | 0.0042 | 0.0066 | -0.2193 |
| MILC | 0 | 702 | 0.0000 | 0.0010 | -1.0000 |
| MILSP | 0 | 4 | 0.0000 | 0.0000 | -1.0000 |
| ML | 0 | 9 | 0.0000 | 0.0000 | -1.0000 |
| MM | 0 | 317 | 0.0000 | 0.0005 | -1.0000 |
| MME | 0 | 793 | 0.0000 | 0.0012 | -1.0000 |
| MP | 0 | 51 | 0.0000 | 0.0001 | -1.0000 |
| MYSP | 0 | 76 | 0.0000 | 0.0001 | -1.0000 |
| OA | 396 | 26709 | 0.2080 | 0.0395 | 0.6808 |
| OFAV | 81 | 5270 | 0.0425 | 0.0078 | 0.6904 |
| OFRA | 166 | 78250 | 0.0872 | 0.1157 | -0.1406 |
| OX | 107 | 38927 | 0.0562 | 0.0576 | -0.0120 |
| PA | 304 | 224707 | 0.1597 | 0.3323 | -0.3509 |
| PB | 0 | 1 | 0.0000 | 0.0000 | -1.0000 |
| PBSP | 2 | 1460 | 0.0011 | 0.0022 | -0.3454 |
| PC | 0 | 116 | 0.0000 | 0.0002 | -1.0000 |
| PCL | 4 | 91 | 0.0021 | 0.0001 | 0.8796 |
| PD | 0 | 413 | 0.0000 | 0.0006 | -1.0000 |
| PF | 1 | 435 | 0.0005 | 0.0006 | -0.1010 |
| PP | 52 | 107932 | 0.0273 | 0.1596 | -0.7078 |
| PS | 26 | 1128 | 0.0137 | 0.0017 | 0.7823 |
| SB | 0 | 8 | 0.0000 | 0.0000 | -1.0000 |
| SC | 0 | 48 | 0.0000 | 0.0001 | -1.0000 |
| SCSP | 0 | 53 | 0.0000 | 0.0001 | -1.0000 |
| SI | 76 | 3601 | 0.0399 | 0.0053 | 0.7646 |
| SL | 0 | 1 | 0.0000 | 0.0000 | -1.0000 |
| SR | 1 | 5785 | 0.0005 | 0.0086 | -0.8843 |
| SS | 444 | 81086 | 0.2332 | 0.1199 | 0.3209 |
| SSPP | 0 | 1 | 0.0000 | 0.0000 | -1.0000 |
| TC | 0 | 1 | 0.0000 | 0.0000 | -1.0000 |
| UNK | 0 | 1 | 0.0000 | 0.0000 | -1.0000 |
ggplot(Ivlev_Index, aes(x = reorder(SPP, IvlevIndex), y = IvlevIndex, fill = IvlevIndex > 0)) +
geom_bar(stat = "identity", width = 0.8) +
labs(title = "Ivlev's Electivity Index for Cliona on Different Coral Species",
x = "Coral Species",
y = "Ivlev's Electivity Index") +
theme_minimal(base_size = 14) +
scale_fill_manual(values = c("red", "steelblue"),
name = "Preference",
labels = c("Avoidance", "Preference")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 10), # Adjust angle and font size
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"), # Center and bold title
legend.position = "top") + # Move legend to the top
geom_hline(yintercept = 0, linetype = "dashed", color = "black", size = 0.5) # Add horizontal line at 0
The bar graph shows Ivlev’s Electivity Index for Cliona sponge prevalence on various coral species, quantifying the degree of preference (positive values) or avoidance (negative values) exhibited by Cliona. Coral species are displayed along the x-axis, with the y-axis representing the electivity index, ranging from -1 (complete avoidance) to 1 (complete preference). Most coral species show a strong avoidance by Cliona, evidenced by a majority of red bars with negative values. Notably, species such as MFCL, PS, and DL exhibit the highest positive electivity indices, indicating a clear preference for colonization. This suggests that these coral species may provide favorable conditions for Cliona, such as structural complexity or susceptibility to colonization. Conversely, species such as AC, AF, and AT demonstrate the most significant avoidance, with values near -1. This information suggests that there is selective colonization behavior of Cliona, which may have important implications for understanding its ecological interactions and impacts on coral reef ecosystems.
Spencer Parr